Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
Dami Choi, Derrick Xin, Hamid Dadkhahi, Justin Gilmer, Ankush Garg, Orhan Firat, Chih-Kuan Yeh, Andrew M. Dai, Behrooz Ghorbani
In this paper, we empirically study the optimization dynamics of multi-task learning, particularly focusing on those that govern a collection of tasks with significant data imbalance. We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high- and low-resource tasks. We provide a thorough empirical study and analysis of this method's benefits, showing that it achieves consistent improvements relative to the performance trade-off profile of standard static weighting. We analyze under what data regimes this method is applicable and show its improvements empirically in neural machine translation (NMT) and multilingual language modeling.
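The pre-train-then-fine-tune recipe described above can be sketched as a two-stage task-sampling schedule. The sketch below is illustrative only: the task names, weights, and step counts are hypothetical and not taken from the paper.

```python
import random

def make_sampler(task_weights):
    """Return a function that draws a task id according to `task_weights`."""
    tasks = list(task_weights)
    weights = [task_weights[t] for t in tasks]
    def sample(rng):
        return rng.choices(tasks, weights=weights, k=1)[0]
    return sample

def two_stage_schedule(step, pretrain_steps, high_resource, mixture):
    """Pick the sampling distribution for this step: high-resource tasks
    only during pre-training, then the full high/low mixture afterwards."""
    return high_resource if step < pretrain_steps else mixture

# Hypothetical language pairs and mixing weights (for illustration only).
high = {"en-fr": 0.6, "en-de": 0.4}                 # high-resource tasks
mix = {"en-fr": 0.4, "en-de": 0.3, "en-ga": 0.3}    # adds low-resource en-ga

rng = random.Random(0)
stage1 = two_stage_schedule(step=100, pretrain_steps=10_000,
                            high_resource=high, mixture=mix)
stage2 = two_stage_schedule(step=20_000, pretrain_steps=10_000,
                            high_resource=high, mixture=mix)
task = make_sampler(stage2)(rng)  # a task id drawn from the mixture
```

The key design point is that the schedule changes only the sampling distribution, not the optimizer or loss, which is what makes the method simple to drop into an existing multi-task training loop.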
Do Current Multi-Task Optimization Methods in Deep Learning Even Help?
Derrick Xin, Behrooz Ghorbani, Ankush Garg, Orhan Firat, Justin Gilmer
Recent research has proposed a series of specialized optimization algorithms for deep multi-task models. It is often claimed that these multi-task optimization (MTO) methods yield solutions that are superior to the ones found by simply optimizing a weighted average of the task losses. In this paper, we perform large-scale experiments on a variety of language and vision tasks to examine the empirical validity of these claims. We show that, despite the added design and computational complexity of these algorithms, MTO methods do not yield any performance improvements beyond what is achievable via traditional optimization approaches. We highlight alternative strategies that consistently yield improvements to the performance profile and point out common training pitfalls that might cause suboptimal results. Finally, we outline challenges in reliably evaluating the performance of MTO algorithms and discuss potential solutions.
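The traditional baseline the abstract refers to, optimizing a fixed weighted average of the per-task losses, can be sketched in a few lines. The task names and weights below are illustrative assumptions, not values from the paper.

```python
def scalarized_loss(task_losses, task_weights):
    """Static weighting baseline: optimize a fixed convex combination of
    per-task losses rather than a specialized multi-task update rule."""
    assert set(task_losses) == set(task_weights), "tasks must match"
    return sum(task_weights[t] * task_losses[t] for t in task_losses)

# Hypothetical per-task losses and static weights (illustration only).
losses = {"translation": 2.0, "language_modeling": 1.0}
weights = {"translation": 0.5, "language_modeling": 0.5}
total = scalarized_loss(losses, weights)  # 0.5*2.0 + 0.5*1.0 = 1.5
```

Sweeping the static weights traces out the performance trade-off profile against which the paper evaluates MTO methods.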